
Remove combo scaling from Aim and Speed from osu! performance calculation #16280

Open · wants to merge 43 commits into master

Conversation

@apollo-dw (Member) commented Dec 29, 2021

This proposal removes combo scaling from the Aim and Speed skills in osu!, and overhauls their miss penalties. Currently, the performance calculator does not know where in a map the player missed - for example, whether it was on the hardest section or an easier one. Therefore, the fairest way to treat misses is to assume they are spread equally across the map, i.e. a 500x/1000x play with 1 miss and a 750x/1000x play with 1 miss should be weighted similarly.

Misses, in general, have been made harsher. The initial miss now carries a harsher penalty, to differentiate between FCs and non-FCs. The scaling is no longer based on the total object count, but instead on the number of relevant difficult strains. This is measured by taking each strain, dividing it by the top strain, raising the result to the fourth power, and summing over all strains. As a result, maps with consistent difficulty are treated more leniently, and long maps with a short spike in difficulty are treated similarly to shorter maps.
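
For illustration, a minimal sketch of that weighted count (the names and shape here are assumptions, not the PR's exact code - the implementation evolved over the PR's 43 commits, and a later comment below mentions a hard-threshold variant):

using System;
using System.Collections.Generic;
using System.Linq;

public static class StrainCountSketch
{
    // Weight each strain section by how close it is to the hardest section:
    // (strain / topStrain)^4 is ~1 near peak difficulty and ~0 for easy
    // sections, so the sum behaves like a count of relevantly difficult strains.
    public static double RelevantDifficultStrains(IReadOnlyList<double> strainPeaks)
    {
        double topStrain = strainPeaks.Max();
        return strainPeaks.Sum(strain => Math.Pow(strain / topStrain, 4));
    }
}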

There have been concerns about consistency supposedly mattering less, and I don't think that's the case. Miss counts are a fine metric to rely on here, and again, using combo doesn't make much sense if we can't tell from the performance side which parts of the map are hard, or where the player missed. I hope that the strain count stuff described in the miss penalty change above can at least put some of the concerns to rest :P

Values (and a very rough Q&A thingy) can be found at https://pp.huismetbenen.nl/rankings/players/apollo_visual

int speedDifficultyStrainCount = ((OsuStrainSkill)skills[2]).RelevantDifficultStrains();

// Total number of strains in a map can vary by clockrate, and this needs to be corrected for.
aimDifficultyStrainCount = (int)(aimDifficultyStrainCount * clockRate);
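
(Context, as I understand the strain skill implementation: strain sections cover a fixed 400 ms window, so a higher clock rate compresses a map into proportionally fewer sections; multiplying the count by clockRate normalizes for this.)
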
Member:

Make RelevantDifficultStrains take the clock rate as a param instead? So that it'd be ((OsuStrainSkill)skills[0]).RelevantDifficultStrains(clockRate); with the signature public int RelevantDifficultStrains(double clockRate).
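
A sketch of the suggested shape (the method body here is an assumption, lifted from the call-site correction shown above):

// Hypothetical: fold the clock-rate correction into the method itself, so
// call sites don't need to remember to apply it.
public int RelevantDifficultStrains(double clockRate)
{
    // relevantDifficultStrains() stands in for the existing uncorrected count.
    return (int)(relevantDifficultStrains() * clockRate);
}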

Member Author:

Yup, good idea.


private double calculateMissPenalty(double missCount, double strainCount)
{
    return 0.95 / ((missCount / (3 * Math.Sqrt(strainCount))) + 1);
}
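
For a sense of the curve, some illustrative values (assuming strainCount = 100, so 3 * Math.Sqrt(strainCount) = 30):

// calculateMissPenalty(1, 100)  = 0.95 / (1/30 + 1)  ≈ 0.919
// calculateMissPenalty(5, 100)  = 0.95 / (5/30 + 1)  ≈ 0.814
// calculateMissPenalty(20, 100) = 0.95 / (20/30 + 1) ≈ 0.570

More misses, or fewer difficult strain sections, pull the multiplier further down.
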
Member:

This probably needs some comments.

Member Author:

not really sure what I could comment here, and I don't think it'd end up being very helpful anyway? have you got anything specific in mind?

Member:

Something about why exactly you need difficult strains here would be good, I think. Basically, just explain the idea so that people who have no idea what's going on can understand the motivation for this penalty.
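
One possible shape for such a comment, drawing on the PR description above (assumed wording):

// Scale the miss penalty with the number of difficult strain sections rather
// than the total object count: maps with consistently high difficulty are
// treated more leniently, while a long map with one short difficulty spike is
// punished like a short map.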

@@ -57,5 +57,12 @@ public override double DifficultyValue()

return difficulty * DifficultyMultiplier;
}

public int RelevantDifficultStrains()
Member:

Xmldoc?
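
For illustration, one possible doc comment (assumed wording, not taken from the PR):

/// <summary>
/// The number of strain sections comparable in difficulty to the map's top
/// strain, used to scale the miss penalty in performance calculation.
/// </summary>
public int RelevantDifficultStrains()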

aimValue *= 0.97 * Math.Pow(1 - Math.Pow((double)effectiveMissCount / totalHits, 0.775), effectiveMissCount);

aimValue *= getComboScalingFactor();
aimValue *= calculateMissPenalty(effectiveMissCount, Attributes.AimDifficultStrainCount);
@SensouShin commented Dec 30, 2021:

You should calculate miss penalty only once and save it into a variable in my opinion.

Member Author:

the penalties for aim and speed are different because AimDifficultStrainCount and SpeedDifficultStrainCount are different, so there is no repetition...

@PercyDan54 (Contributor) commented Dec 30, 2021

I don't think this makes much sense
It seems farming pp maps higher than your skill level is made easier
Very subjective opinion maybe

[image]

@apollo-dw (Member Author) commented Dec 31, 2021

"I don't think this makes much sense. It seems farming pp maps higher than your skill level is made easier. Very subjective opinion maybe"

just for clarity, here are the effects the miss penalty is having on the maps here, compared to their FC values (no misses):

make a move 163.4 -> 121.9 (-41.5, 74.6%)
hikari 176.6 -> 119.9 (-56.7, 67.9%)
stay 154.5 -> 110.9 (-43.6, 71.8%)

these plays are at least a full star above one of the FCs there (make a move is 6.11*, and harumachi clover expert is 4.74*), so you can make a judgement there whether those plays with misses are a similar skill level to the harumachi clover FC. also worth noting that farm maps will be farm either way :P

@cihe13375 commented:

Passer-by here, not really familiar with performance calculation algorithms, but here are some potential issues I see (just in case they've been missed):

(1) How would sliderbreak (sb) counts be evaluated? It seems that you use effectiveMissCount, but from what I see it returns the minimum number of combobreaks possible (if > number of misses), which may severely underestimate sb counts on slider-heavy maps (e.g. I may have a score with 550/1000 combo, 0xmiss and 10xsb, but effectiveMissCount returns 1). It might be possible to estimate sb count from the number of 100s, but it's another story to estimate how many 100s originate from sbs...

(2) Following (1), because effectiveMissCount is not a continuous function of max combo, the pp value may change abruptly (~10% on several maps I tried) with MaxCombo changing by ±1 (again, assuming 0xmiss). This may cause some confusion, and it's also not really good imo (the miss penalty was usually buried in combo scaling before, so it wasn't obvious like this).

(3) A hard threshold (66%) is used in calculating the strainCount, and I'm not sure how "hackable" this is (it might be possible to create a map with most strain values sitting around 70% of the maximum strain, which would then be considerably overrated; similarly for being underrated). A continuous weighting curve would be safer here, though idk how difficult it is to implement.
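
A continuous weight along those lines might look like this (purely a sketch of the suggestion, not code from the PR):

// Replace a hard 66% cutoff with a smooth logistic weight: sections far below
// the threshold contribute ~0, sections near the top strain contribute ~1,
// and there is no abrupt jump for a map to sit just above.
double strainWeight(double strain, double topStrain)
    => 1.0 / (1.0 + Math.Exp(-10.0 * (strain / topStrain - 0.66)));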

Anyway it's great to see scores with unlucky combobreaks in the middle getting what they deserve. Good job!

@aticie commented Jan 3, 2022

Re: points (1) and (2) above:

These two issues are out of the scope of this pull request, which is about removing combo scaling. The issues you mention are already present in the live version; this PR is not introducing them.

@peppy marked this pull request as draft June 16, 2023
@apollo-dw marked this pull request as ready for review May 23, 2024
@apollo-dw requested a review from a team May 23, 2024
@tsunyoku (Sponsor Member) left a comment

🎉

Comment on lines +51 to +52
double aimDifficultyStrainCount = ((OsuStrainSkill)skills[0]).CountDifficultStrains();
double speedDifficultyStrainCount = ((OsuStrainSkill)skills[2]).CountDifficultStrains();
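
(Note: by this point in the PR's history the method appears to have been renamed from RelevantDifficultStrains to CountDifficultStrains, now returning a double - compare the earlier snippets above.)
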
Sponsor Member:

At some point I'd really like to get rid of the indexing of skills because it makes me anxious, but that's a future problem..

@stanriders (Member) left a comment

The time has come.

@stanriders (Member):

@smoogipoo this PR is considered approved by the osu! pp committee and is ready for merge.

@tsunyoku (Sponsor Member):

@ppy/team-client pinging at the request of bdach (sorry for the double ping smoogi, not intended) to make it clear these changes are ready, and also to provide some direction.

This will require 2 new difficulty attributes as a bare minimum to deploy for lazer. I don't know if it's viable to add any new attributes to the database, but they're quite small, so I'm hoping it might be possible even if just for this rework.

If the added storage requirements are not negotiable, then it will have to wait for real-time difficulty calculation, since there's no way to proceed without the new attributes.

To deploy to stable, it's going to require that the score performance processor in osu-queue-score-statistics is in use, to prevent the scenario where a lower-pp score overwrites a higher-pp one because the old infrastructure overwrites based on score. I know this isn't new information, I'm just reminding.

I'm hoping that, with this rework approved alongside a few other reworks stacked up for review, the focus on solving pp infra concerns and eventually running recalculations will be given higher priority than it has been so far. This rework is pretty huge in value and in general results in large changes for a lot of users, so it would be great if we could see it deployed and recalculated sooner rather than later. I'm highly aware the infra work is no small task, but it would be nice to see it worked on!

It'd be nice to get some time estimate on how viable it is for these changes to be deployed and recalculated: if a recalculation isn't viable soon, we may focus on getting a few other smaller reworks merged in first. Equally, I'm keen to push these specific changes to deployment and recalculation as soon as realistically possible.

@smoogipoo (Contributor):

"to prevent the scenario where a lower-pp score overwrites a higher-pp one because the old infrastructure overwrites based on score"

I think this is the blocker for the time being. @peppy and @bdach know better than me at this point the process by which scores are un-preserved and deleted.

@bdach (Collaborator) commented May 28, 2024

Deployment considerations as far as I can tell:

  • Two new difficulty attributes per beatmap/mod combo. This is:
    • 2 rows * 13 bytes per row
    • for 54 mod combos for each beatmap
    • for ~111700 ranked beatmaps (as per data.ppy.sh dumps)
    • totalling ~156 MB of extra storage overhead
    • plus the need to do a full run of osu! server-side diffcalc for all beatmaps
  • For the "score overwriting" situation:
    • lazer scores are in a better situation, since right now - to the best of my knowledge - marking scores as non-preserved and deleting them still isn't online, and even if it were, it'd be doing the right thing.
    • stable is indeed the problem, as adding highest-pp to the score preservation criteria probably requires changes to web-10 / osu-performance for the quickest outcome (lazer infra just imports across whatever web-10 decides to preserve). Or just shutting off osu-performance, with all of what that would involve. Would need @peppy's assessment on feasibility and/or willingness to do this.
    • Note that stable is already using the osu-queue-score-statistics processor for user total PP, which means it correctly handles multiple preserved plays (it'll pick the highest-PP one, not the highest-score one). To make this work for stable, we actually just need the highest-PP-but-not-highest-score score to not disappear at the web-10 end.

@@ -25,6 +25,8 @@ public class DifficultyAttributes
protected const int ATTRIB_ID_SCORE_MULTIPLIER = 15;
protected const int ATTRIB_ID_FLASHLIGHT = 17;
protected const int ATTRIB_ID_SLIDER_FACTOR = 19;
protected const int ATTRIB_ID_AIM_DIFFICULT_STRAIN_COUNT = 21;
Contributor:

is this really intended to be a duplicate of ATTRIB_ID_SPEED_NOTE_COUNT?

@bdach (Collaborator) commented Jun 5, 2024:

This is a good (and very scary) catch. I can see how this could have happened: the speed note count attribute was added in #15035, which precedes the open date of this PR.

That said, I'm not sure I would trust any sheet results at this point given this revelation, because I'm not sure what this means for correctness of calculation. This probably needs a full diffcalc re-run and re-verification of results after fixing.

@tsunyoku (Sponsor Member) commented Jun 5, 2024:

Oh dear... Luckily, no sheets were generated for the latest changes. osu-tools isn't using these attribute IDs, and the huismetbenen website that users rely on doesn't use them either, so I'm not concerned about the values - but obviously, this needs to be fixed before anything that does use these attributes can be relied upon.

@peppy (Sponsor Member) commented Oct 3, 2024

What's going on here? This still has conflicts. Should I look to resolve them or is someone going to handle that?

@bdach (Collaborator) commented Oct 3, 2024

To play devil's advocate, these conflicts 100% weren't there before. They're probably from #29291.

@tsunyoku (Sponsor Member) commented Oct 3, 2024

Yeah, those conflicts were fresh from recent refactors (not the first time these refactors have caused me issues 🫠). I've resolved them now.

Projects: Pending Review